Case-Sensitivity of Classifiers for WSD: Complex Systems Disambiguate Tough Words Better

Authors

  • Harri M. T. Saarikoski
  • Steve Legrand
  • Alexander F. Gelbukh
Abstract

We present a novel method for improving disambiguation accuracy by building an optimal ensemble (OE) of systems, in which we predict the best available system for each target word using a priori case factors (e.g. the amount of training data per sense). We report promising results from a series of best-system prediction tests (the best prediction accuracy is 0.92) and show that complex systems disambiguate tough words better, while simple systems do better on easy words. The method provides the following benefits: (1) higher disambiguation accuracy for virtually any set of base systems (the current best OE yields close to a 2% accuracy gain over the Senseval-3 state of the art) and (2) an economical way of building more effective ensembles of all types (e.g. optimal, weighted-voting and cross-validation-based). The method is also highly scalable, in that it estimates word difficulty from factors that are readily available for any ambiguous word in any language and defines classifier complexity using known properties only.
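The best-system prediction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `WordCase` fields, the `difficulty` score, the single-threshold predictor and the toy data are all assumptions made for illustration. The abstract names only "amount of training per sense" as an example factor and does not say how the predictor is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordCase:
    """A priori factors for one target word (hypothetical feature set)."""
    word: str
    n_senses: int           # assumed factor: number of senses
    train_per_sense: float  # factor named in the abstract: training per sense


def difficulty(case: WordCase) -> float:
    # Assumed score: more senses and less training per sense = tougher word.
    return case.n_senses / (1.0 + case.train_per_sense)


def fit_threshold(cases, best_systems):
    """Find the difficulty cutoff that best predicts which system type won.

    best_systems holds "simple" or "complex" for each word, i.e. the system
    type that achieved the higher accuracy on that word in past evaluations.
    Returns (cutoff, prediction accuracy on the given words).
    """
    scored = sorted((difficulty(c), label) for c, label in zip(cases, best_systems))
    best_cut, best_acc = scored[0][0], -1.0
    for cut, _ in scored:
        correct = sum(
            ("complex" if score >= cut else "simple") == label
            for score, label in scored
        )
        acc = correct / len(scored)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


def predict_best_system(case: WordCase, cut: float) -> str:
    """Route tough words to the complex system and easy words to the simple one."""
    return "complex" if difficulty(case) >= cut else "simple"


# Toy data, invented for illustration only (not results from the paper).
cases = [
    WordCase("bank", 2, 100.0),
    WordCase("art", 3, 50.0),
    WordCase("line", 6, 10.0),
    WordCase("play", 8, 5.0),
]
best_systems = ["simple", "simple", "complex", "complex"]

cut, acc = fit_threshold(cases, best_systems)
choice = predict_best_system(WordCase("run", 10, 2.0), cut)
```

An optimal ensemble in this sense would then send each target word to whichever base system `predict_best_system` selects. A real predictor would presumably use more factors and a stronger learner than a single cutoff.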

Similar resources

Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity

Most previous corpus-based algorithms disambiguate a word with a classifier trained from previous usages of the same word. Separate classifiers have to be trained for different words. We present an algorithm that uses the same knowledge sources to disambiguate different words. The algorithm does not require a sense-tagged corpus and exploits the fact that two different words are likely to have...

A Broad-Coverage Word Sense Tagger

In other words, previous corpus-based WSD algorithms learn to disambiguate a polysemous word from previous usages of the same word. This has several undesirable consequences. Firstly, a word must occur thousands of times before a good classifier can be trained. There are thousands of polysemous words, e.g., 11,562 polysemous nouns in WordNet (Miller, 1990). For every polysemous word to occur th...

A Preliminary Study on the Impact of Lexical Concreteness on Word Sense Disambiguation

Psychologists have shown that abstract words are harder to understand and often acquired later than concrete words. In this work, we study how the difficulty of automatic word sense disambiguation (WSD) might be affected by this intrinsic property of words, namely the concreteness of a word and its individual senses. We also explore the feasibility of inducing a numerical index for sense and le...

Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach

In this paper, we present a new approach for word sense disambiguation (WSD) using an exemplar-based learning algorithm. This approach integrates a diverse set of knowledge sources to disambiguate word sense, including part of speech of neighboring words, morphological form, the unordered set of surrounding words, local collocations, and verb-object syntactic relation. We tested our WSD program...

Publication date: 2007